Reinforcement Learning from Human Feedback (RLHF) is a powerful paradigm for aligning foundation models to human values and preferences. However, current RLHF techniques cannot account for the naturally occurring differences in individual human preferences across a diverse population. When these differences arise, traditional RLHF frameworks simply average over them, leading to inaccurate rewards and poor performance for individual subgroups. To address the need for pluralistic alignment, we develop a class of multimodal RLHF methods. Our proposed techniques are based on a latent variable formulation: inferring a novel user-specific latent and learning reward models and policies conditioned on this latent without additional user-specific data. While conceptually simple, we show that in practice this reward modeling requires careful algorithmic considerations around model architecture and reward scaling. To empirically validate our proposed technique, we first show that it can combat underspecification in simulated control problems, inferring and optimizing user-specific reward functions. Next, we conduct experiments on pluralistic language datasets representing diverse user preferences and demonstrate improved reward function accuracy. We additionally show the benefits of this probabilistic framework in terms of measuring uncertainty and actively learning user preferences. This work enables learning from diverse populations of users with divergent preferences, an important challenge that naturally occurs in problems from robot learning to foundation model alignment.

Free, publicly accessible full text available December 10, 2025.
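To make the latent variable formulation concrete, below is a minimal PyTorch sketch of a user-conditioned reward model trained with a variational (ELBO-style) preference objective. All class names, network sizes, and the specific encoder design are illustrative assumptions, not the authors' released code; the abstract's points about architecture and reward scaling are noted only in comments.

```python
# Minimal sketch of latent-variable reward learning for pluralistic RLHF.
# Names and architectures are illustrative assumptions, not the paper's code.
# Assumes fixed-size feature vectors for each preference comparison; the
# reward-scaling considerations the abstract mentions are not handled here.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentEncoder(nn.Module):
    """Encodes a user's labeled comparisons into a latent z (mean, log-variance)."""
    def __init__(self, feat_dim, latent_dim, hidden=64):
        super().__init__()
        # Each comparison is (preferred, rejected) feature vectors, concatenated.
        self.net = nn.Sequential(
            nn.Linear(2 * feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * latent_dim),
        )
        self.latent_dim = latent_dim

    def forward(self, comparisons):  # (n_comparisons, 2 * feat_dim)
        # Mean-pool over the user's comparisons: permutation-invariant summary.
        h = self.net(comparisons).mean(dim=0)
        return h[:self.latent_dim], h[self.latent_dim:]  # mu, logvar

class LatentConditionedReward(nn.Module):
    """Reward model r(x; z) conditioned on the inferred user latent."""
    def __init__(self, feat_dim, latent_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim + latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, feats, z):
        z = z.expand(feats.shape[0], -1)  # broadcast latent to every sample
        return self.net(torch.cat([feats, z], dim=-1)).squeeze(-1)

def elbo_loss(encoder, reward, preferred, rejected, beta=1e-3):
    """Bradley-Terry preference likelihood plus a KL penalty on the latent."""
    mu, logvar = encoder(torch.cat([preferred, rejected], dim=-1))
    z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization
    logits = reward(preferred, z) - reward(rejected, z)
    nll = F.binary_cross_entropy_with_logits(logits, torch.ones_like(logits))
    kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum()
    return nll + beta * kl

# Toy usage: one simulated user with 8 labeled comparisons over 16-dim features.
enc, rew = LatentEncoder(16, 4), LatentConditionedReward(16, 4)
opt = torch.optim.Adam(list(enc.parameters()) + list(rew.parameters()), lr=1e-3)
preferred, rejected = torch.randn(8, 16), torch.randn(8, 16)
elbo_loss(enc, rew, preferred, rejected).backward()
opt.step()
```

The design choice to mean-pool comparisons makes the encoder usable with any number of labeled pairs per user, which is what allows inferring z for a new user "without additional user-specific data" beyond their preference labels.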
In-context learning and chain-of-thought prompting have demonstrated surprising performance improvements on mathematical reasoning benchmarks. Therefore, understanding the underlying factors enabling these capabilities is crucial. However, the specific aspects of pretraining data that equip models with mathematical reasoning capabilities remain largely unexplored and have not been studied systematically. In this study, we identify subsets of a model's pretraining data that contribute to its mathematical reasoning ability and evaluate the model on several mathematical operations (e.g., addition, multiplication) and tasks (e.g., the ASDiv dataset). We measure the importance of each subset by continually training the model on it and then quantifying the change in performance on the mathematical benchmarks. If a subset yields improved performance, we conjecture that it contributes to the model's overall mathematical ability. Our results reveal that while training on math-only data contributes to simple arithmetic abilities, it does not by itself explain more complex reasoning abilities such as chain-of-thought reasoning. We also find that code data contributes to chain-of-thought reasoning while reducing arithmetic performance.
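The measurement protocol this abstract describes reduces to a simple loop: continually train a copy of the base model on one candidate subset, then record the change in benchmark score relative to the untrained baseline. Below is a hedged sketch in which `train_fn` and `eval_fn` are hypothetical stand-ins for a real continual-pretraining run and a benchmark harness; the toy demonstration only exercises the control flow.

```python
# Sketch of subset-importance measurement via continual training.
# `train_fn` and `eval_fn` are hypothetical stand-ins for a real
# continual-pretraining loop and a math-benchmark evaluation.
from copy import deepcopy

def subset_importance(base_model, subsets, benchmark, train_fn, eval_fn):
    """Change in benchmark score after continual training on each subset."""
    baseline = eval_fn(base_model, benchmark)
    scores = {}
    for name, subset in subsets.items():
        model = deepcopy(base_model)   # fresh copy per subset, no cross-contamination
        train_fn(model, subset)        # continual pretraining on this subset only
        scores[name] = eval_fn(model, benchmark) - baseline
    return scores

# Toy demonstration with a stand-in "model" (a dict of weights); in practice
# these would be a language model and an evaluation on e.g. arithmetic or ASDiv.
def toy_train(model, subset): model["w"] += sum(subset)
def toy_eval(model, benchmark): return model["w"]

subsets = {"math_only": [0.1, 0.2], "code": [0.3], "web_text": [0.0]}
print(subset_importance({"w": 0.0}, subsets, benchmark=None,
                        train_fn=toy_train, eval_fn=toy_eval))
```

A positive entry in the returned dictionary corresponds to the paper's conjecture that the subset contributes to the model's overall mathematical ability; a negative entry matches findings like code data reducing arithmetic performance.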